FiTMuSiC: leveraging structural and (co)evolutionary data for protein fitness prediction

Systematically predicting the effects of mutations on protein fitness is essential for the understanding of genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6) and performs as well as much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC’s robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC’s qualities and accuracy, which compare favorably with those of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of the variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC, which is freely available for academic use and requires no bioinformatics expertise, making our tool accessible to the entire scientific community.


Introduction
Accurately quantifying the effect of genetic variants on the fitness of the encoded proteins is one of the open challenges in biology which, if resolved, would have a tremendous impact on the understanding and treatment of genetic diseases [1][2][3]. The experimental approaches commonly used to quantify variant effects include different mutagenesis experiments [4][5][6][7] and large-scale exome screening approaches [8,9]. However, these remain expensive and time-consuming, and given the ever-increasing amount of genetic data that is being generated, the number of variants of unknown significance (VUS) waiting to be characterized keeps growing [10]. Moreover, the genetic bases of the majority of rare diseases are still not deciphered [11], and this is even more true for complex diseases such as cancer [12]. New complementary approaches are thus needed to interpret and classify these VUS and, more generally, to gain novel insights into these matters.
In the last two decades, many computational tools have been developed to predict the phenotypic effect of genetic variants [13][14][15][16][17][18][19][20][21][22][23][24]. They are mainly based on evolutionary features combined using machine learning techniques. The most recent predictors, such as [14,22,24], take advantage of the advent of deep learning approaches, as enough experimental data has become available to train complex models for fitness prediction [7]. These methods could in principle help accelerate the discovery of clinically relevant variants and their molecular effects, but their low accuracy and poor generalization properties are major obstacles to having a strong impact on clinical decision-making. In addition, black-box machine learning models do not help improve our understanding of pathogenic mechanisms.
Currently, the gold standard to assess the performance of fitness prediction methods is the blind community-wide experiment called Critical Assessment of Genome Interpretation (CAGI) [25][26][27], which evaluates predictors on unpublished data. CAGI allows for an unbiased assessment of the methods as well as the identification of their strengths and weaknesses. Moreover, it provides guidelines on how to translate computational predictions into clinical practice.
In this paper we present our new method, FiTMuSiC, which we used in the recent CAGI6 experiment to predict the fitness of hydroxymethylbilane synthase (HMBS) variants. We begin with a presentation and discussion of our computational approach and of its performance in CAGI6. We then showcase additional results of our method on clinically relevant variants. Our results show that FiTMuSiC achieves very good performance when applied to unseen data, which demonstrates that simple linear combination models can actually perform as well as more complex deep learning-based models such as AlphaMissense [24].

Features
We briefly describe the features used by our method, which are of two kinds: structural and evolutionary. Structural features use the 3-dimensional (3D) structure of the wild-type protein as input. They include:
• Relative solvent accessibility (RSA). It is defined as the ratio (in %) between the solvent accessible surface area of a residue in its given 3D structure and in a Gly-X-Gly tripeptide extended conformation; it is computed by an in-house program [28].
• PoPMuSiC (POP) [29]. This computational tool predicts the change in protein thermodynamic stability upon point mutations (ΔΔG) using the 3D structure of the target protein as input. It is based on the formalism of statistical potentials [30], with the energy values and RSA used as features in an artificial neural network.
• MAESTRO (MAE) [31]. This tool also predicts the ΔΔG based on the protein 3D structure. It uses contact potentials as features, as well as some biophysical properties of the mutated and wild-type residues such as hydrophobicity and isoelectric point.
• SNPMuSiC (SNP) [16]. It is a predictor of variant deleteriousness based on structural and evolutionary features. Its evolutionary part is the PROVEAN algorithm [17], and its structural part consists of statistical potentials and RSA appropriately combined with artificial neural networks (ANN). We used here the structural part only, since PROVEAN is used by FiTMuSiC as a separate feature.
FiTMuSiC also includes four evolutionary features. To compute them, we generated a multiple sequence alignment (MSA) of the target sequence using JackHMMER [32] (with database UniRef90 [33], one iteration and an E-value threshold of 0.01). The evolutionary features are:
• PROVEAN score (PVS) [17]. It is a purely evolutionary tool that predicts the functional effect of variants. We used an in-house re-implementation of the program, which differs slightly from the original version: it uses the pairwise alignment of the wild-type with the homologous sequences to calculate the alignment scores of the variants, rather than realigning them for each variant.
• Conservation Index (CI) [34]. It is calculated from f_i(a) and f(a), the regularized frequencies of amino acid a at position i in the MSA and in the full MSA, respectively (Eq. 1). These frequencies are computed from c_i(a) and c(a), the number of occurrences of a at position i and in the full MSA, respectively, with m the depth of the MSA and N its length. The pseudocount parameter θ is set to 0.01 and defines the strength of the regularization; 21 is the number of possible states (20 amino acids and 1 gap). The CI score (Eq. 2) is calculated from these frequencies, with A denoting the set of 20 standard amino acids.
• Log-odd ratio score (LOR) [35]. It is the log-odd ratio of observing the wild-type amino acid wt with respect to the mutated amino acid mt at position i (Eq. 3).
• pyCoFitness score (PYF) [36]. This score is obtained through a method that infers a coevolutionary model from the MSA using a pseudo-likelihood maximization direct coupling analysis approach [37], and employs the inferred model to compute the change in fitness due to the variant.
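To make the frequency-based features more concrete, the following Python sketch computes regularized frequencies, a conservation index and a log-odd ratio from a toy MSA. The exact expressions used by FiTMuSiC are not reproduced in this text, so the functional forms below (a pseudocount mixed uniformly over the 21 states, a root-mean-square deviation from background frequencies, and a log ratio of column frequencies) are assumptions chosen to match the verbal definitions above; all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the set A of 20 standard amino acids
STATES = AMINO_ACIDS + "-"            # 21 states: 20 amino acids and 1 gap
THETA = 0.01                          # pseudocount strength quoted in the text


def column_frequencies(msa, i, theta=THETA):
    """Regularized frequencies f_i(a) at column i (ASSUMED functional form)."""
    m = len(msa)  # depth of the MSA
    counts = Counter(seq[i] for seq in msa)
    return {a: theta / len(STATES) + (1 - theta) * counts[a] / m for a in STATES}


def background_frequencies(msa, theta=THETA):
    """Regularized frequencies f(a) over the full MSA (ASSUMED functional form)."""
    m, n = len(msa), len(msa[0])  # depth and length of the MSA
    counts = Counter(ch for seq in msa for ch in seq)
    return {a: theta / len(STATES) + (1 - theta) * counts[a] / (m * n) for a in STATES}


def conservation_index(msa, i):
    """ASSUMED form: deviation of column frequencies from the background, summed over A."""
    f_i = column_frequencies(msa, i)
    f = background_frequencies(msa)
    return math.sqrt(0.5 * sum((f_i[a] - f[a]) ** 2 for a in AMINO_ACIDS))


def log_odd_ratio(msa, i, wt, mt):
    """ASSUMED form: log of the ratio between wild-type and mutant column frequencies."""
    f_i = column_frequencies(msa, i)
    return math.log(f_i[wt] / f_i[mt])


# Toy alignment (made-up sequences, for illustration only).
toy_msa = ["ACD", "ACE", "ACD", "A-D"]
print(conservation_index(toy_msa, 0), log_odd_ratio(toy_msa, 2, "D", "E"))
```

With these assumed forms, the pseudocount keeps every frequency strictly positive, so the log ratio remains finite even for amino acids never observed at a position.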

Model structure and training
The FiTMuSiC model is a simple linear combination of the eight features listed above (Eq. 4). Its free parameters α_i (i = 1, ..., 9) were identified based on a training set of deep-mutagenesis scanning data on three proteins: SUMO-conjugating enzyme UBC9 (UBE2I), small ubiquitin-related modifier 1 (SUMO1) and thiamin pyrophosphokinase 1 (TPK1) [38]. Structural features were computed using models from the AlphaFold Protein Structure Database [39].
The scale convention of FiTMuSiC values is the following: a value of 1 means equal fitness for wild-type and mutant; a value of 0 or below means the mutant is not fit at all; a value larger than 1 means that the mutant is fitter than the wild-type.
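As an illustration of a linear combination model and of the scale convention, the Python sketch below evaluates a weighted sum of the eight features plus a constant term. Treating the ninth free parameter as an intercept is an assumption, and the weights shown are placeholders, not the fitted FiTMuSiC parameters; the function names are hypothetical.

```python
FEATURES = ("RSA", "POP", "MAE", "SNP", "PVS", "CI", "LOR", "PYF")


def linear_fitness(features, weights, intercept):
    """Weighted sum of the eight features plus a constant (ASSUMED model shape)."""
    missing = [name for name in FEATURES if name not in features]
    if missing:
        raise KeyError(f"missing feature values: {missing}")
    return intercept + sum(weights[name] * features[name] for name in FEATURES)


def describe(score):
    """Read a score according to the scale convention stated in the text."""
    if score <= 0:
        return "not fit at all"
    if score < 1:
        return "reduced fitness"
    if score == 1:
        return "same fitness as wild-type"
    return "fitter than wild-type"


# Placeholder weights and feature values, for illustration only.
weights = {name: 0.0 for name in FEATURES}
weights["POP"] = -0.5
variant = {name: 0.0 for name in FEATURES}
variant["POP"] = 1.0
score = linear_fitness(variant, weights, intercept=1.0)
print(score, describe(score))
```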

Additional models submitted to CAGI6
In addition to FiTMuSiC, we submitted the predictions of two other models to the CAGI6 challenge. The first is a simple rescaling of the SNPMuSiC score (SNP) (Eq. 5), in which the numerical factors β_1 and β_2 were chosen to rescale the SNPMuSiC values and were identified on the fitness training set described in the previous subsection.
Although stability and fitness are imperfectly correlated [40], we also submitted a prediction model based on a rescaling of the score of the thermodynamic stability predictor PoPMuSiC (POP) (Eq. 6), in which the parameters γ_1 and γ_2 were identified on the same training set as the other models. The ReLU functions bound the output between 0 and 1.
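The bounding role of the ReLU functions can be illustrated with the identity ReLU(x) − ReLU(x − 1) = min(max(x, 0), 1). The Python sketch below applies it to an affine rescaling of a raw score. The affine form and the parameters `scale` and `offset` are hypothetical stand-ins for the rescaling parameters, whose exact role and fitted values are not given in this text.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def bound_unit(x):
    """Clamp x to [0, 1] using only ReLU functions."""
    return relu(x) - relu(x - 1.0)


def rescaled_score(raw, scale, offset):
    """ASSUMED affine rescaling of a raw predictor score, bounded to [0, 1]."""
    return bound_unit(scale * raw + offset)


# Placeholder parameters, for illustration only.
for raw in (-3.0, 0.5, 4.0):
    print(raw, rescaled_score(raw, scale=0.5, offset=0.5))
```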

Model interpretation
To give information about the molecular effect of variants, FiTMuSiC provides four scores in addition to the global fitness of the variants. The first is the RSA of the mutated residue, which provides information on its spatial location in the 3D structure. The second is the z-score Z defined as:
$$Z = \frac{\mathrm{FiTMuSiC} - \mu}{\sigma} \qquad (7)$$
where µ and σ represent the mean and standard deviation of the FiTMuSiC score over all mutations on the given protein, respectively. Negative z-scores correspond to mutants that are less fit than average mutants; positive z-scores indicate mutants that are fitter than average mutants, with very positive values corresponding to mutants fitter than the wild-type.
The last two scores, Z_str and Z_evo, give information about the extent to which the structural features (SNP, POP, MAE) and evolutionary features (CI, LOR, PVS, PYF) contribute to the global fitness of the considered variant. The structural (STR) and evolutionary (EVO) contributions to the fitness are defined in Eqs. (8) and (9), and their z-scores Z_str and Z_evo in Eqs. (10) and (11). Negative Z_str values correspond to mutations that destabilize the structure more than average mutations; positive Z_str values indicate mutations that are less destabilizing than average mutations or are even stabilizing. Negative Z_evo values correspond to mutations into residues that are rarely to never observed at that position across evolution or, more precisely, that are evolutionarily unfavorable in the sequence context; positive Z_evo values indicate mutations into residues that are evolutionarily favorable.
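A z-score of this kind can be computed as in the following Python sketch, which standardizes a list of per-mutation values by their mean and standard deviation over all mutations of a protein. The use of the population standard deviation, and the idea of applying the same standardization separately to the global score and to the structural and evolutionary contributions, are assumptions; the input values are made up.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values by their mean and (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Made-up scores for three mutations of the same protein.
fitness = [0.0, 1.0, 2.0]
print(z_scores(fitness))
```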

Predicting fitness of HMBS variants
HMBS, also known as porphobilinogen deaminase, is an enzyme involved in the heme biosynthesis pathway, and more specifically in the conversion of porphobilinogen into the heme precursor hydroxymethylbilane [41]. Mutations in this gene have been associated with acute intermittent porphyria (AIP), which is a rare metabolic disease with life-threatening neurovisceral attacks that require frequent hospitalization of patients [42]. As almost one third of HMBS variants annotated in the ClinVar database [43] are VUS, saturation mutagenesis experiments using high-throughput yeast complementation assays have recently been performed to estimate the fitness of HMBS variants and better understand the pathogenic mechanisms leading to AIP [44]. This data was unpublished at the time of the CAGI6 experiment and was used as blind fitness values to assess predictors. Among the 5963 HMBS single-site missense mutations with experimental fitness values from [44], the CAGI6 assessors discarded hyper-complementing mutations (with experimental scores above 1.36), leaving a final evaluation dataset of 5811 mutations [27]. Indeed, it has previously been reported by the authors of the experiments that such variants displaying increased fitness in yeast assays could be mostly disadvantageous in humans [38,44].
We applied our prediction models FiTMuSiC (Eq. (4)), SNPMuSiC (Eq. (5)) and PoPMuSiC (Eq. (6)) to the HMBS target. We also report the results of the two other top-performing methods among the 11 teams participating in the challenge, i.e. CalVEIR and ELAPSIC (called team 10_5 and 5_1 in [27]). Additionally, we provide the results of six widely used methods for deleteriousness prediction, i.e. FATHMM [13], PROVEAN [17], DEOGEN2 [15], PolyPhen-2.0 [19], EVE [14] and MutPred2 [23], as well as two recently developed deep-learning based predictors, Sequence UNET [22] and AlphaMissense [24]. To ensure consistency with the metrics provided by the CAGI6 HMBS challenge, all methods were benchmarked on the same dataset of 5811 mutations. The performance of the predictors was assessed by three types of correlations (i.e. Pearson correlation, and Spearman and Kendall rank correlations), and the root mean squared deviation (RMSD). The results are given in Table 1.
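For readers who wish to reproduce this type of benchmark, the four metrics can be computed as in the dependency-free Python sketch below. It is not the assessors' code: the tie handling (average ranks for Spearman, the tau-b variant for Kendall) is an assumption, and the input values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b variant, ASSUMED), O(n^2) pair loop."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: contributes to neither term
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + only_x_tied) * (untied + only_y_tied)
    )


def rmsd(x, y):
    """Root mean squared deviation between predictions and measurements."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up experimental and predicted values, for illustration only.
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
print(pearson(experimental, predicted), spearman(experimental, predicted),
      kendall_tau_b(experimental, predicted), rmsd(experimental, predicted))
```

On datasets of several thousand variants, the quadratic pair loop becomes slow; library routines such as `scipy.stats.pearsonr`, `scipy.stats.spearmanr` and `scipy.stats.kendalltau` are the more practical choice.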
Note that the current version of FiTMuSiC (available on our webserver) slightly outperforms the version used for the CAGI6 HMBS challenge due to a small implementation modification. Namely, we now consider the SNP and PVS terms separately (as described in Methods), whereas they were aggregated into a single term in the previous version. The Kendall, Spearman and Pearson correlations improved from (0.30, 0.43, 0.42) to (0.31, 0.45, 0.45), respectively, between the first and second versions. However, to ensure the blind nature of the challenge, we presented in the table the performances of the older FiTMuSiC version.
Among CAGI6 participants, FiTMuSiC performs as well as the other two best performing predictors, ELAPSIC and CalVEIR, with very similar performance metrics. CalVEIR shows the best results in rank-based metrics, ELAPSIC in Pearson correlation and FiTMuSiC in RMSD. These three predictors all perform significantly better than the other 8 teams participating in CAGI6 [27]. They also perform significantly better than the other methods tested (FATHMM, PROVEAN, DEOGEN2, PolyPhen-2, Sequence UNET and MutPred2), except for EVE and AlphaMissense. We observe that FiTMuSiC outperforms EVE in rank-based metrics but not in Pearson correlation and that, conversely, FiTMuSiC outperforms AlphaMissense in Pearson correlation but not in rank-based metrics. Overall, these five best performing methods display very comparable scores and their respective ranking depends on the metric considered.
We also wish to underline the good performance of the SNPMuSiC deleterious variant predictor [16], which only slightly underperforms the best methods. In contrast, PoPMuSiC [29], which predicts stability changes upon mutations, does not perform as well. This is not surprising given that deleteriousness and fitness are very well correlated, while stability and fitness are less so. For example, all functional residues are highly important for fitness while very poorly optimized for stability [40].
The performance of the tested methods can be considered good given that the HMBS data was not seen by any of the methods. However, there is still room for improvement, as the Pearson correlation coefficient of all methods is below 0.5. Note, however, that the noisiness of deep-mutagenesis datasets (with both random and systematic errors) puts an upper bound to the

Feature analysis and model interpretation
It is well known that enzymes exhibit an activity-stability trade-off: residues in catalytic regions are optimized for functional reasons and less or not at all for stability, while other residues are very important for protein folding and stability and play little to no role in function [40]. FiTMuSiC can help in distinguishing these functional and structural contributions. Indeed, it outputs the z-scores Z_str and Z_evo (Eqs. 10, 11), which inform us about the extent to which structural and/or evolutionary features contribute to protein fitness and provide us with a molecular-level understanding of variant effects. It also gives us information about the RSA of the mutated residues, and thus about their location in the protein.
We focused here on three functionally or structurally important residue groups of HMBS, which are structurally represented in Fig. 1 and colored according to their average per-residue z-score values Z_evo and Z_str. Paired Z_str and Z_evo values of all single-site mutations are plotted in Fig. 2, with the mutations of the selected residue groups highlighted.
The region around the catalytic site of HMBS is represented in Figs. 1a, b and 2a. The catalytic residues (K98, D99, R149, R150, R167, R173 and C261) were identified by aligning the sequences of the considered human HMBS and of Escherichia coli HMBS, and by mapping the seven catalytic residues of the latter [45] annotated in the Catalytic Site Atlas [46]. These residues are thus functionally important, well conserved and very specific. As expected, mutating them results in very negative Z_evo values (between −2.41 and −0.59), which reflects drastic reduction or loss of function. In contrast, they contribute little to structural stability, as seen from the predicted Z_str values centered around zero (between −1.43 and +1.04).
The second region considered is the salt bridge between the negatively charged residue E250 and the positively charged residue R116 (Figs. 1c, d and 2b). It is a highly specific interaction that has been shown by molecular dynamics simulations to play an essential role in the enzyme's fold [44]. The Z_evo and Z_str values of these two residues are predicted to be negative on average (−1.41 and −0.43, respectively), indicating fitness reduction upon mutations. Z_evo is negative for all mutations (≤ −0.72), whereas Z_str is only negative on average (between −1.53 and +0.37). The high specificity of the interaction gives a particularly strong evolutionary signal, whereas the stabilizing effect of salt bridges is less marked compared to other interactions.
Finally, the hydrophobic cluster of the three residues V124, I186 and L193 (Figs. 1e, f and 2c), located in the core of the protein, is very important for the stability of the protein fold. It thus shows strongly negative Z_str values, with some exceptions that correspond to mutations from one hydrophobic residue into another. In contrast, this cluster plays no direct role in the protein's enzymatic activity and, moreover, hydrophobic interactions have low specificity, and hydrophobic residues are often substituted with other hydrophobic residues across evolution. This explains the large width of the Z_evo distribution (between −1.52 and +1.64), and its only weakly negative average value (−0.70). The Z_str values are also widely spread (between −3.05 and +0.38) but are more shifted towards negative values (average of −1.60).
Comparing the coefficients in Eqs. (8) and (9) when all features of the linear regression are normalized by their standard deviation, we found the contribution of the evolutionary features to the final score to be about 3 times greater than that of structural features, which indicates that evolutionary terms hold a relatively larger predictive power. However, it is the combination of both contributions that leads to the highest precision, and structural terms thus improve the detection of deleterious variants. For instance, most mutations of residue L244 have very low experimental fitness and display a largely negative Z_str but a positive Z_evo. We postulate that the deleterious nature of these mutations has not been detected by evolutionary features due to the relatively low frequency of leucine in the MSA at this position (about 0.02). Another advantage of the structural terms is that they are reliable on proteins or protein regions with low evolutionary information (resulting in low-depth MSA regions), such as de novo designed proteins. Indeed, none of the structural terms rely on evolutionary information.
In summary, the combination of both structural and evolutionary terms makes it possible to interpret whether the deleterious effect of a mutation is attributed to a loss of function or to a perturbation of the protein fold. Since evolution and structure are related, it is no surprise that we often observe correlated Z_str and Z_evo values. However, this correlation is limited (Pearson correlation of 0.40). As a matter of fact, there are a lot of counterexamples where Z_str and Z_evo have opposite signs, as seen in Fig. 2. This reflects the fact that the evolutionary and structural components of fitness are complementary, and that combining them into a single model increases both its accuracy and interpretability.
Fig. 2 Scatter plots of paired Z_str and Z_evo values for all single-site mutations in HMBS. Mutations of a the catalytic residues K98, D99, R149, R150, R167, R173 and C261, b the salt bridge residues E250 and R116 and c the hydrophobic cluster residues V124, I186 and L193 are highlighted in purple

Prediction of HMBS gain-of-function variants
Variants displaying an increased fitness compared to the wild-type, sometimes referred to as gain-of-function (GoF) variants, are known to be difficult to predict and to interpret [47]. Furthermore, as pointed out above, very high fitness values in yeast assays tend to be deleterious in humans, making their interpretation even more ambiguous. FiTMuSiC, as well as the other assessed predictors, cannot be used to accurately detect GoF variants. However, we still note that the set of variants with experimental fitness above 1.1 (about one tenth of all HMBS mutations) has positive Z_evo and Z_str values on average (0.51 and 0.37, respectively). In addition, when comparing the average z-score of the GoF variant predictions, FiTMuSiC displays the highest value (0.54) among all tested methods.

FiTMuSiC application to HMBS variant pathogenicity and activity
Fitness predictors are expected to play a crucial role in the classification and interpretation of genetic variants by providing complementary information to the experimental characterizations [48]. It should be noted, however, that the experimental HMBS fitness values of the CAGI6 challenge come from a deep mutagenesis experiment that uses functional complementation yeast assays, which cannot fully reflect the complex mechanisms underlying variants' pathogenicity and activity.
In this context, we assessed all the predictors considered as well as the experimental yeast assay data [44] on their ability to distinguish clinically annotated pathogenic and benign variants in humans. To that end, we collected the 53 pathogenic or likely pathogenic variants in HMBS that are related to AIP and the 13 benign or likely benign variants from ClinVar [43]. The metrics we used to assess the methods' performances are sensitivity, specificity and balanced accuracy (BACC), for which we used the default prediction thresholds provided by the methods (and 0.5 for FiTMuSiC), as well as a threshold-independent metric, the area under the receiver operating characteristic curve (AUC-ROC). We reported all performances in Table 2.
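The classification metrics can be computed as in the Python sketch below. It assumes that pathogenic variants are the positive class and that a variant is predicted pathogenic when its fitness score falls below the threshold; this direction is an assumption for fitness-like scores, and the scores and labels shown are made up.

```python
def confusion_metrics(scores, is_pathogenic, threshold=0.5):
    """Sensitivity, specificity and balanced accuracy at a given threshold.

    ASSUMPTION: a score below the threshold means 'predicted pathogenic'.
    """
    tp = fn = tn = fp = 0
    for score, pathogenic in zip(scores, is_pathogenic):
        predicted_pathogenic = score < threshold
        if pathogenic and predicted_pathogenic:
            tp += 1
        elif pathogenic:
            fn += 1
        elif predicted_pathogenic:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


def auc_roc(scores, is_pathogenic):
    """Probability that a pathogenic variant scores lower than a benign one.

    Ties count for one half; this pairwise form equals the area under the ROC curve.
    """
    pathogenic = [s for s, p in zip(scores, is_pathogenic) if p]
    benign = [s for s, p in zip(scores, is_pathogenic) if not p]
    wins = sum(
        1.0 if p < b else 0.5 if p == b else 0.0
        for p in pathogenic
        for b in benign
    )
    return wins / (len(pathogenic) * len(benign))


# Made-up fitness scores and clinical labels, for illustration only.
scores = [0.1, 0.4, 0.6, 0.9, 0.8]
labels = [True, True, True, False, False]
print(confusion_metrics(scores, labels), auc_roc(scores, labels))
```

In this toy example one pathogenic variant scores above the threshold, which lowers sensitivity but not the AUC-ROC, since all pathogenic variants still rank below all benign ones.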
We observe that FiTMuSiC predicts the pathogenicity of the variants with very high accuracy, with a BACC of 0.94 and an AUC-ROC of 0.98, and is only slightly outperformed by AlphaMissense, which has a BACC of 0.95 and an AUC-ROC of 0.99. It performs better than all other computational methods and also, notably, than the experimental high-throughput fitness data obtained by yeast complementation assays to evaluate variant pathogenicity. We found that some of the computational methods tested are heavily biased towards pathogenic variants, for example PolyPhen-2 and FATHMM. This can be explained by the choice of the threshold values proposed by their authors. They thus have very poor specificity and predict very few neutral variants. FiTMuSiC does not suffer from this bias and reaches almost perfect accuracy in identifying neutral variants. Note that EVE also performs well and is only slightly less accurate than FiTMuSiC.
Table 2 Performance on 66 HMBS variants with clear clinical annotations taken from ClinVar [43], using all predictors assessed as well as experimental fitness data obtained by yeast complementation assays [44]. The best score for each metric is indicated in bold
As an additional verification of FiTMuSiC's robustness, we checked whether it is able to predict the effect of variants on HMBS in vitro activity. We reported in Table 3 the correlations between the results of the predictors or high-throughput experiments and the experimentally measured activity of 35 variants described in [49]. These results show that FiTMuSiC performs very well. It even outperforms in Pearson correlation the experimental fitness data from [44] and is only outperformed by AlphaMissense.

FiTMuSiC application to human glucokinase
To further test the robustness of FiTMuSiC, we applied it to another blind test set containing experimental high-throughput fitness data of single-site variants in human glucokinase (GCK). This enzyme plays a key role in insulin secretion in pancreatic β-cells: it catalyzes the first step of glycolysis by transforming glucose into glucose-6-phosphate [50]. Inactivating GCK variants have been related to maturity-onset diabetes of the young as well as to permanent neonatal diabetes mellitus [50,51]. Hyperactive GCK variants are also deleterious and lead to persistent hyperinsulinemic hypoglycemia of infancy.
In order to shed light on the molecular effects that lead to these disorders, the GCK activity of 8570 single-site variants has been experimentally assessed using functional complementation yeast assays [52]. We used this set of variants as an independent test set to assess the fitness predictors. To ensure homogeneity between this dataset and the data provided for HMBS, we floored all negative fitness values to zero and excluded all values with standard error exceeding 0.3, as was done in the experimental data from [44]. This gives a final number of 6862 missense mutations; note that the experimental data appears to be noisier on GCK than on HMBS, as experiments on the latter were repeated twice. We show the performances of FiTMuSiC and other computational tools on GCK in Table 4. FiTMuSiC is among the top ranked predictors on this additional test set, with performance metric values in line with those of the HMBS benchmark.
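The two preprocessing steps described above (flooring negative fitness values to zero and discarding measurements with a standard error above 0.3) can be sketched in Python as follows. The record layout `(variant, fitness, standard_error)` and the function name are hypothetical, and the values are made up.

```python
def clean_fitness_records(records, max_standard_error=0.3):
    """Drop noisy measurements, then floor negative fitness values to zero.

    Each record is a (variant, fitness, standard_error) tuple (hypothetical layout).
    """
    cleaned = []
    for variant, fitness, standard_error in records:
        if standard_error > max_standard_error:
            continue  # measurement too noisy: excluded from the benchmark
        cleaned.append((variant, max(0.0, fitness), standard_error))
    return cleaned


# Made-up measurements, for illustration only.
raw = [
    ("V62M", 0.9, 0.1),   # kept unchanged
    ("A11C", -0.2, 0.2),  # kept, fitness floored to 0.0
    ("G25D", 0.5, 0.4),   # dropped, standard error above 0.3
]
print(clean_fitness_records(raw))
```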
We also evaluated the ability of the methods to classify deleterious and benign GCK variants that are defined based on clinical annotations. For that purpose, we curated a set of variants in GCK from ClinVar [43] with clear clinical interpretation. This led us to a collection of 69 pathogenic or likely pathogenic variants, and 3 benign or likely benign variants. The very low number of benign variants and the bias of predictors towards deleterious variants make this test case relatively easy, and most methods thus reach very high scores: five methods have an AUC-ROC of at least 0.97 (Table 5). FiTMuSiC also shows good performance with a BACC of 0.89 and an AUC-ROC of 0.99. Due to the strong imbalance of this test set, we suggest considering these results with caution.
It should be noted that experimental fitness data from complementation yeast assays do not predict deleteriousness very well for GCK variants (Table 5). The BACC and AUC-ROC values are even lower than in the case of HMBS. Some reported pathogenic variants, such as V62M, T65I and H137R, seem to be benign in the experimental fitness map. Their deleteriousness has been suggested to be related to effects such as modest structural instability, which are not captured by the assay [52]. This observation underlines the importance of reliable and robust prediction methods to complement experimental data for annotation and interpretation of variants.

Webserver
In order to make FiTMuSiC readily available to the scientific community, we have developed an easy-to-use webserver at http://babylone.ulb.ac.be/FiTMuSiC/. Users need to input a 3D structure of the target protein in one of three ways:
1. Provide its PDB ID if it is available in the Protein Data Bank (PDB) [53]; the structure is automatically retrieved.
2. Provide its UniProt ID; the corresponding AlphaFold DB structure [39] is then retrieved.
3. Provide a personal structure in PDB format (.pdb).
Since FiTMuSiC provides results on a per-chain basis, users need to select which chain they want the results for. Note that FiTMuSiC only outputs the results of a single chain, but the structural components of the model take into account all the chains contained in the structure file when computing the fitness score. Therefore, we recommend that users provide protein structures that correspond to biological units, especially when dealing with multimers.
Once the chain has been selected and submitted, the computation starts. Depending on the length of the query protein and the depth of its MSA, users should expect the computation to be completed in a few minutes for short proteins to a few hours for very long proteins. Once the computation is done, a CSV file with the results is sent to the email address provided during the submission. This file contains the RSA of all residues in the protein and the predicted fitness scores for all possible single-site variants. The last four columns contain fitness score information, i.e. the raw FiTMuSiC score and the z-scores Z, Z_evo and Z_str (Eqs. 4, 7, 10, 11). More information about the webserver and its usage is available on the help page (http://babylone.ulb.ac.be/FiTMuSiC/help.php).

Conclusion
We presented here FiTMuSiC, our new computational model based on a combination of structural and (co)evolutionary information, which predicts the impact of single-site amino acid substitutions on protein fitness. We applied it to predict the fitness of variants in HMBS, one of the targets of the CAGI6 challenge. It was rated as one of the top three predictors by the CAGI6 assessors [27]. The strengths of FiTMuSiC can be summarized as follows:
• It is based on a simple model, which is less prone to overfitting and biases towards the training set than machine learning models with thousands of parameters. This allows for very good performances on blind, independent test sets as we have shown here for variants in HMBS and GCK.
• It retains interpretability by providing the Z_evo and Z_str scores, which allow distinguishing between variants that impact more on function or on stability.
• It is available through an easy-to-use webserver, which allows users to get FiTMuSiC results in a simple way even without bioinformatics background.
For all these reasons, FiTMuSiC is of interest to the large community of scientists interested in the prioritization, classification and interpretation of genetic variants. Moreover, it represents a reliable, complementary and cheaper approach compared to experimental methods.

Fig. 1 Contributions of structural and evolutionary features to HMBS fitness, represented by Z_str and Z_evo, respectively. Negative z-scores (indicating mutations less fit than average mutations) are in red, close to zero scores in white and positive scores (indicating mutations fitter than average mutations) in blue. a, b Catalytic region, with the catalytic residues K98, D99, R149, R150, R167, R173 and C261 shown in sticks, and the substrate in green; c, d Salt bridge partners E250 and R116 shown in sticks; e, f Cluster of the three buried hydrophobic residues V124, I186 and L193 shown in sticks

Table 1
Fitness prediction results of the benchmarked methods on the 5811 variants used in the CAGI6 HMBS challenge [27]. The best score for each metric is indicated in bold. The performances were taken from the assessors' results for CAGI6 participants, while for the other methods we evaluated the performances ourselves. EVE's predictions are available for only 5152/5811 variants; missing values were set to the median

Table 3
Correlation coefficients between experimental activity on 35 HMBS variants measured in [49] and the fitness values obtained by the assessed predictors and by experimental yeast complementation assays [44]. The best score for each metric is indicated in bold

Table 4
Correlations between fitness values obtained by high-throughput experiments using functional complementation yeast assays [52] on 6862 GCK variants and those predicted by all the methods assessed. The best score for each metric is indicated in bold. *EVE's predictions are available for only 6414/6862 GCK variants; missing values were set to the median

Table 5
Performance on 72 GCK variants with clear clinical annotations taken from ClinVar [43], using all predictors assessed as well as experimental fitness data obtained by yeast complementation assays [52]. The best score for each metric is indicated in bold